Starting the Analysis Cluster

NEXUS uses Apache Spark running on Apache Mesos for its analytic functions. Now that the infrastructure is running, we can bring up the analysis cluster.

The analysis cluster consists of an Apache Mesos cluster and the NEXUS webapp Tornado server. The Mesos cluster we will bring up has one master node and three agent nodes. Apache Spark is already installed and configured on the three agent nodes, which will act as Spark executors for the NEXUS analytic functions.
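For reference, the HTTP endpoints exposed by this topology can be sketched as a small Python mapping. The hostnames and ports below match the containers started later in this tutorial (Mesos master UI on 5050, NEXUS webapp on 8083, Spark application UI on 4040); `service_url` is a hypothetical helper, not part of NEXUS.

```python
# Sketch of the analysis-cluster topology described above. Hostnames and
# ports are taken from this tutorial's container setup.
ANALYSIS_CLUSTER = {
    "mesos-master": {"host": "mesos-master", "port": 5050},  # Mesos web UI / REST API
    "nexus-webapp": {"host": "nexus-webapp", "port": 8083},  # NEXUS Tornado server
    "spark-ui":     {"host": "nexus-webapp", "port": 4040},  # Spark application UI
}

def service_url(name, path="/"):
    """Build the base URL for one of the cluster's HTTP endpoints."""
    svc = ANALYSIS_CLUSTER[name]
    return "http://{host}:{port}{path}".format(path=path, **svc)

print(service_url("mesos-master", "/state.json"))
# http://mesos-master:5050/state.json
```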

Step 1: Start the Containers

We can use docker-compose again to start our containers.

TODO

  1. Navigate to the directory containing the docker-compose.yml file for the analysis cluster

    $ cd ~/nexus/esip-workshop/docker/analysis
    
  2. Use docker-compose to bring up the containers in the analysis cluster

    $ docker-compose up -d
    

Step 2: Verify the Cluster is Working

Now that the cluster has started we can use various commands to ensure that it is operational and monitor its status.

TODO

  1. List all running docker containers.

    $ docker ps
    

    The output should look similar to this:

    CONTAINER ID        IMAGE                         COMMAND                  CREATED             STATUS              PORTS                                            NAMES
    e5589456a78a        nexusjpl/nexus-webapp         "/tmp/docker-entry..."   5 seconds ago       Up 5 seconds        0.0.0.0:4040->4040/tcp, 0.0.0.0:8083->8083/tcp   nexus-webapp
    18e682b9af0e        nexusjpl/spark-mesos-agent    "/tmp/docker-entry..."   7 seconds ago       Up 5 seconds                                                         mesos-agent1
    8951841d1da6        nexusjpl/spark-mesos-agent    "/tmp/docker-entry..."   7 seconds ago       Up 6 seconds                                                         mesos-agent3
    c0240926a4a2        nexusjpl/spark-mesos-agent    "/tmp/docker-entry..."   7 seconds ago       Up 6 seconds                                                         mesos-agent2
    c97ad268833f        nexusjpl/spark-mesos-master   "/bin/bash -c './b..."   7 seconds ago       Up 7 seconds        0.0.0.0:5050->5050/tcp                           mesos-master
    90d370eb3a4e        nexusjpl/jupyter              "tini -- start-not..."   2 days ago          Up 2 days           0.0.0.0:8000->8888/tcp                           jupyter
    cd0f47fe303d        nexusjpl/nexus-solr           "docker-entrypoint..."   2 days ago          Up 2 days           8983/tcp                                         solr2
    8c0f5c8eeb45        nexusjpl/nexus-solr           "docker-entrypoint..."   2 days ago          Up 2 days           8983/tcp                                         solr3
    27e34d14c16e        nexusjpl/nexus-solr           "docker-entrypoint..."   2 days ago          Up 2 days           8983/tcp                                         solr1
    247f807cb5ec        cassandra:2.2.8               "/docker-entrypoin..."   2 days ago          Up 2 days           7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp      cassandra3
    09cc86a27321        zookeeper                     "/docker-entrypoin..."   2 days ago          Up 2 days           2181/tcp, 2888/tcp, 3888/tcp                     zk1
    33e9d9b1b745        zookeeper                     "/docker-entrypoin..."   2 days ago          Up 2 days           2181/tcp, 2888/tcp, 3888/tcp                     zk3
    dd29e4d09124        cassandra:2.2.8               "/docker-entrypoin..."   2 days ago          Up 2 days           7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp      cassandra2
    11e57e0c972f        zookeeper                     "/docker-entrypoin..."   2 days ago          Up 2 days           2181/tcp, 2888/tcp, 3888/tcp                     zk2
    2292803d942d        cassandra:2.2.8               "/docker-entrypoin..."   2 days ago          Up 2 days           7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp      cassandra1
    
  2. List the available Mesos slaves by running the cell below.


In [ ]:
# TODO Run this cell to see the status of the Mesos slaves. You should see 3 slaves connected.

import requests
import json

response = requests.get('http://mesos-master:5050/state.json')
print(json.dumps(response.json()['slaves'], indent=2))
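Rather than reading the raw JSON, you can count connected agents programmatically. The hypothetical helper below works on the parsed `/state.json` response; each slave entry carries an `"active"` flag in the Mesos state endpoint, and the sample response here is trimmed down for illustration.

```python
def summarize_slaves(state):
    """Count connected agents in a parsed Mesos /state.json response.

    `state` is the dict returned by response.json(); each entry in its
    "slaves" list carries an "active" flag.
    """
    slaves = state.get("slaves", [])
    active = [s for s in slaves if s.get("active")]
    return len(active), len(slaves)

# Trimmed-down sample response (real entries carry many more fields):
sample = {"slaves": [{"hostname": "mesos-agent1", "active": True},
                     {"hostname": "mesos-agent2", "active": True},
                     {"hostname": "mesos-agent3", "active": False}]}
print(summarize_slaves(sample))  # (2, 3)
```

In a healthy cluster for this tutorial, both counts should be 3.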

Step 3: List available Datasets

Now that the cluster is up, we can investigate the datasets available. Use the nexuscli module to list available datasets.

TODO

  1. Get a list of datasets by using the nexuscli module to issue a request to the nexus-webapp container that was just started.

In [ ]:
import nexuscli

nexuscli.set_target("http://nexus-webapp:8083")
nexuscli.dataset_list()
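If the listing is long, it can help to filter it by short name. The sketch below models each entry as a dict with a `"shortName"` key (an assumption made for illustration; adapt it to whatever `nexuscli.dataset_list()` actually returns in your deployment).

```python
def find_datasets(datasets, substring):
    """Filter a dataset listing by case-insensitive short-name substring.

    Assumes each entry is a dict with a "shortName" key; this is a sketch,
    not the actual nexuscli return type.
    """
    needle = substring.lower()
    return [d for d in datasets if needle in d["shortName"].lower()]

# Hypothetical sample listing using short names from this tutorial:
sample = [{"shortName": "AVHRR_OI_L4_GHRSST_NCEI"},
          {"shortName": "MUR-JPL-L4-GLOB-v4.1"}]
print(find_datasets(sample, "avhrr"))
# [{'shortName': 'AVHRR_OI_L4_GHRSST_NCEI'}]
```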

Step 4: Run a Time Series

Verify the analysis functions are working by running a simple Time Series.

TODO

  1. Run the cell below to produce a time series plot using the analysis cluster you just started.

In [ ]:
# TODO Run this cell to produce a Time Series plot using AVHRR data.
%matplotlib inline
import matplotlib.pyplot as plt
import time
import nexuscli
from datetime import datetime

from shapely.geometry import box

bbox = box(-150, 40, -120, 55)
datasets = ["AVHRR_OI_L4_GHRSST_NCEI"]
start_time = datetime(2013, 1, 1)
end_time = datetime(2013, 12, 31)

start = time.perf_counter()
ts, = nexuscli.time_series(datasets, bbox, start_time, end_time, spark=True)
print("Time Series took {} seconds to generate".format(time.perf_counter() - start))

plt.figure(figsize=(10,5), dpi=100)
plt.plot(ts.time, ts.mean, 'b-', marker='|', markersize=2.0, mfc='b')
plt.grid(True, which='major', color='k', linestyle='-')
plt.xlabel("Time")
plt.ylabel("Sea Surface Temperature (C)")
plt.show()
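Beyond plotting, a quick numeric sanity check on the returned means can confirm the result looks physically reasonable. The helper below is a minimal sketch that works on any list of mean values, such as the `ts.mean` array from the cell above.

```python
def series_summary(means):
    """Basic sanity statistics for a list of time-series mean values."""
    n = len(means)
    avg = sum(means) / n
    return {"count": n, "min": min(means), "max": max(means), "mean": round(avg, 2)}

# Illustrative values only (not real AVHRR output):
print(series_summary([10.5, 11.0, 12.5]))
# {'count': 3, 'min': 10.5, 'max': 12.5, 'mean': 11.33}
```

For the bounding box and year used above, sea surface temperatures should fall in a plausible range of a few degrees to the low twenties Celsius; values far outside that suggest a problem with the ingestion or the query.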

Step 5: Check the Results of the Spark Job

The time series function in the previous cell ran on the Spark cluster. You can use Spark's RESTful interface to check the status of the Spark job.

TODO

  1. Run the cell below to see the status of the Spark Job.

In [ ]:
# TODO Run this cell. You should see at least one successful Time Series Spark job.
import requests

response = requests.get('http://nexus-webapp:4040/api/v1/applications')
appId = response.json()[0]['id']
response = requests.get("http://nexus-webapp:4040/api/v1/applications/%s/jobs" % appId)
for job in response.json():
    print(job['name'])
    print('\t' + job['status'])
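When many jobs have run, a tally of statuses is easier to scan than the per-job listing. The sketch below tallies the `"status"` field of the entries returned by Spark's `/api/v1/applications/<id>/jobs` endpoint; the sample data here is made up for illustration.

```python
from collections import Counter

def job_status_counts(jobs):
    """Tally Spark job statuses from a parsed /api/v1/applications/<id>/jobs response."""
    return Counter(job["status"] for job in jobs)

# Hypothetical sample response entries:
sample = [{"name": "Time Series", "status": "SUCCEEDED"},
          {"name": "Time Series", "status": "SUCCEEDED"},
          {"name": "Daily Diff",  "status": "RUNNING"}]
print(job_status_counts(sample))
# Counter({'SUCCEEDED': 2, 'RUNNING': 1})
```

After running the time series cell above, you should see at least one job counted as SUCCEEDED.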

Congratulations!

You have successfully started a NEXUS analysis cluster and verified that it is functional. Your EC2 instance is now running both the infrastructure and the analysis cluster: